
Differentiable Top-k with Optimal Transport

Neural Information Processing Systems

Finding the k largest or smallest elements from a collection of scores, i.e., the top-k operation, is an important model component widely used in information retrieval, machine learning, and data mining. However, if the top-k operation is implemented in an algorithmic way, e.g., using the bubble algorithm, the resulting model cannot be trained end-to-end using prevalent gradient descent algorithms. This is because these implementations typically involve swapping indices, whose gradient cannot be computed. Moreover, the corresponding mapping from the input scores to the indicator vector of whether each element belongs to the top-k set is essentially discontinuous. To address this issue, we propose a smoothed approximation, namely the SOFT (Scalable Optimal transport-based diFferenTiable) top-k operator. Specifically, our SOFT top-k operator approximates the output of the top-k operation as the solution of an Entropic Optimal Transport (EOT) problem. The gradient of the SOFT operator can then be efficiently approximated based on the optimality conditions of the EOT problem. We then apply the proposed operator to the k-nearest neighbors and beam search algorithms, and numerical experiments demonstrate improved performance.
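The EOT idea can be sketched in a few lines of NumPy: transport the n scores onto two anchors (here the max and min score, an assumption of this sketch rather than the paper's exact construction), assigning k/n of the target mass to the "top" anchor. A few Sinkhorn iterations solve the entropic OT problem, and the plan's first column gives smooth top-k membership indicators. The paper obtains gradients from the EOT optimality conditions; this forward-pass sketch omits that part.

```python
import numpy as np

def soft_top_k(scores, k, eps=0.1, n_iter=200):
    """Soft top-k indicators via an entropic-OT relaxation (Sinkhorn).

    A didactic sketch: scores are transported onto two anchors (max and
    min score); the mass each score sends to the 'max' anchor acts as a
    smooth indicator of top-k membership. Not the authors' exact code.
    """
    scores = np.asarray(scores, dtype=float)
    n = len(scores)
    anchors = np.array([scores.max(), scores.min()])
    # squared-distance cost between each score and the two anchors
    C = (scores[:, None] - anchors[None, :]) ** 2
    # marginals: uniform over scores; k/n of the mass on the 'top' anchor
    mu = np.full(n, 1.0 / n)
    nu = np.array([k / n, (n - k) / n])
    K = np.exp(-C / eps)
    u = np.ones(n)
    for _ in range(n_iter):      # standard Sinkhorn scaling updates
        v = nu / (K.T @ u)
        u = mu / (K @ v)
    P = u[:, None] * K * v[None, :]   # entropic transport plan
    return n * P[:, 0]                # soft indicator of top-k membership

ind = soft_top_k(np.array([0.3, 2.0, -1.0, 1.5, 0.1]), k=2, eps=0.05)
# the two largest scores (indices 1 and 3) receive indicators near 1
```

As eps shrinks, the indicators approach the hard 0/1 top-k vector; larger eps gives a smoother (more trainable) but less exact operator.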








Differentiable Top-k Operator with Optimal Transport

Xie, Yujia, Dai, Hanjun, Chen, Minshuo, Dai, Bo, Zhao, Tuo, Zha, Hongyuan, Wei, Wei, Pfister, Tomas

arXiv.org Machine Learning

The top-k operation, i.e., finding the k largest or smallest elements from a collection of scores, is an important model component, which is widely used in information retrieval, machine learning, and data mining. However, if the top-k operation is implemented in an algorithmic way, e.g., using the bubble algorithm, the resulting model cannot be trained in an end-to-end way using prevalent gradient descent algorithms. This is because these implementations typically involve swapping indices, whose gradient cannot be computed. Moreover, the corresponding mapping from the input scores to the indicator vector of whether each element belongs to the top-k set is essentially discontinuous. To address the issue, we propose a smoothed approximation, namely the SOFT (Scalable Optimal transport-based diFferenTiable) top-k operator. Specifically, our SOFT top-k operator approximates the output of the top-k operation as the solution of an Entropic Optimal Transport (EOT) problem. The gradient of the SOFT operator can then be efficiently approximated based on the optimality conditions of the EOT problem. We apply the proposed operator to the k-nearest neighbors and beam search algorithms, and demonstrate improved performance.


Sparse Weight Activation Training

Raihan, Md Aamir, Aamodt, Tor M.

arXiv.org Machine Learning

They have an indexing unit for enabling sparse multiplication. The computations are spatially mapped and scheduled to these processing elements (PEs) by control and scheduling logic. Each PE generates partial products, which are accumulated to compute the output values and finally stored in DRAM.

Mapping Computations: Consider a convolutional layer, which maps the input activations in ∈ R^(N×C×H_I×W_I) to out ∈ R^(N×F×H_O×W_O). The layer computes F channels of output feature maps, each of dimension H_O×W_O, using C channels of input feature maps of dimension H_I×W_I, for each of the N samples in the mini-batch. The layer has parameters w ∈ R^(F×C×H_K×W_K).

Algorithm 1: Dense forward-pass computation for a single input sample (assuming stride 1)
Data: w, in
Result: out
for h_o = 1 to H_O do
  for w_o = 1 to W_O do
    for f = 1 to F do
      for c = 1 to C do
        for h_k = 1 to H_K do
          for w_k = 1 to W_K do
            h = h_o + h_k; w' = w_o + w_k
            out[f][h_o][w_o] += w[f][c][h_k][w_k] * in[c][h][w']

Thus, as shown in Algorithm 1, each activation is reused F · H_K · W_K times, each weight is reused N · H_O · W_O times, and the total computation per sample is:

Dense Convolution FLOPs = F · H_O · W_O · C · H_K · W_K    (7)

The first three 'for' loops are independent and can be mapped independently to the PEs, whereas the inner three 'for' loops generate the partial products. Different sparse accelerators map the 'for' loops spatially over the PEs in different ways, to maximize reuse and minimize data transfer to and from DRAM.
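The dense forward pass of Algorithm 1 can be checked with a short, didactic Python implementation (a sketch for a single sample, stride 1, "valid" padding; the function name and MAC counter are ours). Counting the multiply-accumulates directly verifies the FLOP formula in equation (7):

```python
import numpy as np

def dense_conv2d(inp, w):
    """Direct six-loop convolution as in Algorithm 1 (stride 1, one sample).

    inp: (C, H_I, W_I) input activations; w: (F, C, H_K, W_K) filters.
    Returns (out, macs), where macs counts multiply-accumulates and
    should equal F * H_O * W_O * C * H_K * W_K, matching equation (7).
    """
    C, H_I, W_I = inp.shape
    F, Cw, H_K, W_K = w.shape
    assert C == Cw
    H_O, W_O = H_I - H_K + 1, W_I - W_K + 1
    out = np.zeros((F, H_O, W_O))
    macs = 0
    # The first three loops are independent (mappable across PEs);
    # the inner three loops accumulate the partial products.
    for h_o in range(H_O):
        for w_o in range(W_O):
            for f in range(F):
                for c in range(C):
                    for h_k in range(H_K):
                        for w_k in range(W_K):
                            out[f, h_o, w_o] += (
                                w[f, c, h_k, w_k] * inp[c, h_o + h_k, w_o + w_k]
                            )
                            macs += 1
    return out, macs
```

A vectorized cross-check (e.g., `np.einsum` over sliding windows) produces the same output, while the MAC count equals F·H_O·W_O·C·H_K·W_K exactly.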